10. Cost Solution

Here's how I implemented MSE:

import numpy as np

class MSE(Node):
    def __init__(self, y, a):
        """
        The mean squared error cost function.
        Should be used as the last node for a network.
        """
        # Call the base class' constructor.
        Node.__init__(self, [y, a])

    def forward(self):
        """
        Calculates the mean squared error.
        """
        # NOTE: We reshape these to avoid possible matrix/vector broadcast
        # errors.
        #
        # For example, if we subtract an array of shape (3,) from an array
        # of shape (3,1) we get an array of shape (3,3) as the result, when
        # we want an array of shape (3,1) instead.
        #
        # Making both arrays (3,1) ensures the result is (3,1) and does
        # an elementwise subtraction as expected.
        y = self.inbound_nodes[0].value.reshape(-1, 1)
        a = self.inbound_nodes[1].value.reshape(-1, 1)
        # m is the total number of examples.
        m = self.inbound_nodes[0].value.shape[0]

        diff = y - a
        # np.mean handles both the sum and the division by m;
        # equivalently: self.value = (1 / m) * np.sum(diff**2)
        self.value = np.mean(diff**2)

The math behind MSE reflects Equation (5), where y is the target output and a is the output computed by the neural network. We square the difference with diff**2; alternatively, this could be np.square(diff). Lastly, we sum the squared differences and divide by the total number of examples m. This can be achieved with np.mean or (1 / m) * np.sum(diff**2).
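To see both the broadcasting pitfall and the equivalence of the two mean formulations, here's a standalone sketch with made-up values for y and a (not part of the node classes above):

```python
import numpy as np

# Hypothetical target and network output for m = 3 examples.
y = np.array([1.0, 2.0, 3.0])          # shape (3,)
a = np.array([[1.5], [2.5], [3.5]])    # shape (3, 1)

# Subtracting mismatched shapes broadcasts to (3, 3) -- not what we want.
assert (y - a).shape == (3, 3)

# Reshaping both to (3, 1) gives the expected elementwise subtraction.
y = y.reshape(-1, 1)
a = a.reshape(-1, 1)
diff = y - a
assert diff.shape == (3, 1)

m = y.shape[0]
# The two formulations of the mean are identical.
mse_mean = np.mean(diff**2)
mse_sum = (1 / m) * np.sum(diff**2)
assert np.isclose(mse_mean, mse_sum)
print(mse_mean)  # 0.25
```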

Note that the order of y and a doesn't actually matter: we could switch them around, (a - y), and still get the same value, since the difference is squared.
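A quick check of that symmetry, again with illustrative values:

```python
import numpy as np

y = np.array([[1.0], [2.0], [3.0]])
a = np.array([[1.5], [2.5], [2.0]])

# Squaring removes the sign, so (y - a) and (a - y) yield the same cost.
assert np.mean((y - a)**2) == np.mean((a - y)**2)
print(np.mean((y - a)**2))  # 0.5
```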